MuMIC – Multimodal Embedding for Multi-Label Image Classification with Tempered Sigmoid
Authors
Abstract
Multi-label image classification is a foundational topic in various domains. Multimodal learning approaches have recently achieved outstanding results in image representation and single-label image classification. For instance, Contrastive Language-Image Pretraining (CLIP) demonstrates impressive image-text representation learning abilities and is robust to natural distribution shifts. This success inspires us to leverage multimodal learning for multi-label classification tasks, and to benefit from contrastively learnt pretrained models. We propose the Multimodal Multi-label Image Classification (MuMIC) framework, which utilizes a hardness-aware tempered sigmoid based Binary Cross Entropy loss function, and thus enables optimization on multi-label objectives and transfer learning on CLIP. MuMIC is capable of providing high classification performance, handling real-world noisy data, supporting zero-shot predictions, and producing domain-specific image embeddings. In this study, a total of 120 image classes are defined, and more than 140K positive annotations are collected on approximately 60K Booking.com images. The final MuMIC model is deployed on the Booking.com Content Intelligence Platform, and it outperforms other state-of-the-art models with 85.6% GAP@10 and 83.8% GAP on all 120 classes, as well as a 90.1% macro mAP score across 32 majority classes. We summarize the modelling choices, which are extensively tested through ablation studies. To the best of our knowledge, we are the first to adapt contrastively learnt multimodal pretraining for real-world multi-label image classification problems, and the innovation can be transferred to other domains.
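The abstract names a tempered sigmoid based Binary Cross Entropy loss but does not give its formula. The sketch below is one plausible reading, not the authors' implementation: each class score is assumed to be an image-text similarity in [-1, 1], it is divided by a temperature before the sigmoid, and the per-class binary cross entropies are averaged. The function names, the default temperature of 0.1, and the clipping constant are all assumptions made for illustration.

```python
import math


def tempered_sigmoid(logit, temperature):
    """Sigmoid of (logit / temperature), evaluated in a numerically stable way."""
    z = logit / temperature
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def tempered_bce(similarities, labels, temperature=0.1, eps=1e-12):
    """Mean binary cross entropy over the classes of one image.

    similarities: one image-text similarity per class (assumed in [-1, 1]).
    labels: 0 or 1 per class; several classes may be positive at once.
    temperature: assumed hyperparameter; smaller values sharpen the sigmoid.
    """
    if len(similarities) != len(labels):
        raise ValueError("similarities and labels must have the same length")
    total = 0.0
    for s, y in zip(similarities, labels):
        p = tempered_sigmoid(s, temperature)
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


# Hypothetical similarities for a 3-class example with two positive labels.
loss = tempered_bce([0.8, -0.3, 0.6], [1, 0, 1])
```

Because each class gets its own sigmoid rather than sharing a softmax, several labels can be predicted as positive for the same image. In this sketch a smaller temperature makes confidently wrong predictions cost more, which is one way a loss can emphasize hard examples; whether this matches the paper's notion of hardness awareness is an assumption.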
Similar resources
Multi-Task Label Embedding for Text Classification
Multi-task learning in text classification leverages implicit correlations among related tasks to extract common features and yield performance gains. However, most previous works treat the labels of each task as independent and meaningless one-hot vectors, which causes a loss of potential information and makes it difficult for these models to jointly learn three or more tasks. In this paper, we prop...
Matrix Completion for Multi-label Image Classification
Recently, image categorization has been an active research topic due to the urgent need to retrieve and browse digital images via semantic keywords. This paper formulates image categorization as a multi-label classification problem using recent advances in matrix completion. Under this setting, classification of testing data is posed as a problem of completing unknown label entries on a data ma...
Multi-label Image Classification with A Probabilistic Label Enhancement Model
In this paper, we present a novel probabilistic label enhancement model to tackle the multi-label image classification problem. Recognizing multiple objects in images is a challenging problem due to label sparsity, appearance variations of the objects, and occlusions. We propose to tackle these difficulties from a novel perspective by constructing auxiliary labels in the output space. Our idea is to...
Multi-Label Image Classification with Regional Latent Semantic Dependencies
Deep convolutional neural networks (CNNs) have demonstrated advanced performance on single-label image classification, and progress has also been made in applying CNN methods to multi-label image classification, which requires annotating objects, attributes, scene categories, etc. in a single shot. Recent state-of-the-art approaches to multi-label image classification exploit the label depen...
Multi-Label Classification with Label Constraints
We extend the multi-label classification setting with constraints on labels. This leads to two new machine learning tasks: First, the label constraints must be properly integrated into the classification process to improve its performance and second, we can try to automatically derive useful constraints from data. In this paper, we experiment with two constraint-based correction approaches as p...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i13.26850